XML for the absolute beginner
A guided tour from HTML to processing XML with Java

XSL: I like your style
People who work in SGML and need to format it generally use DSSSL (Document Style
Semantics and Specification Language) to do the job. DSSSL is a dialect of
Scheme, itself a venerable and popular form of LISP (which stands either
for "List Processing" or "Lots of Irritating, Superfluous Parentheses,"
depending on who you ask). Of course, if you're using DSSSL, you're
already an SGML god and veteran LISP hacker, and therefore should not be
reading this article.
Fortunately, the W3C committees discussing style, HTML, and XML have
included in their design the Extensible Stylesheet Language, or XSL. XSL is
based on DSSSL (and DSSSL-O, the online version of DSSSL), and also uses
some of the style elements of CSS. It's simpler than DSSSL, while
retaining much of its power (much like the relationship between XML and
SGML). XSL's notation, however, may be surprising: it's XML. The simplest
way to say it is: XSL is an XML document that specifies how to transform
another XML document. Say, what?
Why XSL is so useful
XSL is immensely powerful. It
can be used to apply style to a document (as CSS does), and it can also
completely rearrange the input elements for a particular purpose.
For example, XSL can transform XML of one structure into HTML of a
different structure. (We'll see an example of this below.) XSL can also
restructure XML into other document formats: TeX, RTF, and PostScript.
XSL can even transform XML into a different dialect of XML! This may
sound crazy, but it's actually a pretty cool idea. For example, multiple
presentations of the same information could be produced by several
different XSL files applied to the same XML input. Or, let's say two
systems speak different "dialects" of XML but have similar information
requirements. XSL could be used to translate the output of the first
system into something compatible with the input of the second system.
These last few uses are of special interest to Java programmers,
since XSL can be used to translate between different languages in a
distributed network of subsystems, as well as to format documents.
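To make both ideas concrete for Java programmers, here is a minimal sketch that applies two different stylesheets to the same input: one produces an HTML presentation, the other rewrites the document into a second XML "dialect." It assumes the JAXP transformation API (javax.xml.transform) that ships with current JDKs. That API implements XSLT 1.0 rather than the 1998 working draft discussed in this article, so the stylesheets use the XSLT 1.0 namespace. Both dialects (a <Recipe> with <Name> and <Serves>, and a <dish> with title and portions attributes) and the class name TwoViews are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** One input, two stylesheets: an HTML view and a translation into a hypothetical second dialect. */
class TwoViews {

    static final String XSLT_HEADER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Presentation: render the recipe as a small HTML fragment.
    static final String TO_HTML = XSLT_HEADER
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/Recipe'>"
        + "<H3><xsl:value-of select='Name'/></H3><P>Serves <xsl:value-of select='Serves'/></P>"
        + "</xsl:template></xsl:stylesheet>";

    // Translation: "System B" is assumed to expect <dish title='...' portions='...'/>.
    static final String TO_DIALECT_B = XSLT_HEADER
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/Recipe'>"
        + "<dish title='{Name}' portions='{Serves}'/>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document; both are passed in as strings. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String fromSystemA = "<Recipe><Name>Sample Soup</Name><Serves>4</Serves></Recipe>";
        System.out.println(transform(fromSystemA, TO_HTML));      // the page view
        System.out.println(transform(fromSystemA, TO_DIALECT_B)); // what System B receives
    }
}
```

Neither system's code changes when a new consumer appears; only a new stylesheet is written, which is the appeal of using XSL as the translation layer.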
Understanding how to use XSL in simple applications, like transforming XML
to HTML, will help a Java developer understand XSL in general. Let's look
at an example of how to transform XML to HTML with an XSL style sheet.
Formatting XML as HTML: An example
An XSL file is a
series of rules, called templates, that are applied to an input
XML file. Each time a template matches something in the input, the
template produces a new structure in the output (often HTML, as in the
example we're about to see). The new structure is the XML's
content, with the appropriate style applied and arranged as the
XSL specifies. The templates in the XSL file are written in XML, using
specific tags with defined meanings.
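A minimal sketch of that rule-matching behavior follows. It assumes the XSLT 1.0 processor bundled with the JDK (javax.xml.transform) as a stand-in for the XSL processors of the article's era. The Step template is borrowed from the example below: it fires once for every <Step> element it is handed, so three steps in the (hypothetical) input yield three list items. The class name StepTemplates is also hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Two template rules: each one produces output every time it matches a node. */
class StepTemplates {

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Rule 1: an <Instructions> root element becomes an ordered list; its <Step> children are passed on.
      + "<xsl:template match='/Instructions'><OL><xsl:apply-templates select='Step'/></OL></xsl:template>"
        // Rule 2: every <Step> that reaches the processor becomes one list item.
      + "<xsl:template match='Step'><LI><xsl:value-of select='.'/></LI></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input document.
    static final String INPUT =
        "<Instructions><Step>Boil water.</Step><Step>Add noodles.</Step><Step>Drain.</Step></Instructions>";

    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, STYLESHEET));
    }
}
```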
The example below refers again to the XML recipe example in Listing 3.
We're going to look at an XSL file that transforms the XML in Listing 3
into the HTML in Listing 1.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/Recipe">
    <HTML>
      <HEAD>
        <TITLE>
          <xsl:value-of select="Name"/>
        </TITLE>
      </HEAD>
      <BODY>
        <H3>
          <xsl:value-of select="Name"/>
        </H3>
        <STRONG>
          <xsl:value-of select="Description"/>
        </STRONG>
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>
  <!-- Format ingredients -->
  <xsl:template match="Ingredients">
    <H4>Ingredients</H4>
    <TABLE BORDER="1">
      <TR BGCOLOR="#308030"><TH>Qty</TH><TH>Units</TH><TH>Item</TH></TR>
      <xsl:for-each select="Ingredient">
        <TR>
          <!-- handle empty Qty elements separately -->
          <xsl:if test='Qty[not(.="")]'>
            <TD><xsl:value-of select="Qty"/></TD>
          </xsl:if>
          <xsl:if test='Qty[.=""]'>
            <TD BGCOLOR="#404040"></TD>
          </xsl:if>
          <TD><xsl:value-of select="Qty/@unit"/></TD>
          <TD><xsl:value-of select="Item"/>
            <xsl:if test='Item/@optional="1"'>
              <SPAN> -- <em><STRONG>optional</STRONG></em></SPAN>
            </xsl:if>
          </TD>
        </TR>
      </xsl:for-each>
    </TABLE>
  </xsl:template>
  <!-- Format instructions -->
  <xsl:template match="Instructions">
    <H4>Instructions</H4>
    <OL>
      <xsl:apply-templates select="Step"/>
    </OL>
  </xsl:template>
  <xsl:template match="Step">
    <LI><xsl:value-of select="."/></LI>
  </xsl:template>
  <!-- ignore all not matched -->
  <xsl:template match="*" priority="-1"/>
</xsl:stylesheet>
Listing 7. XSL used as an XML language that
transforms XML into something else
(A printable version of this file is in example.xsl).
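Listing 7 declares the December 1998 XSL working-draft namespace (http://www.w3.org/TR/WD-xsl), which today's XSLT processors don't recognize. As a hedged sketch, here are the same templates ported to the XSLT 1.0 namespace and run through the JDK's javax.xml.transform API. The port, the class name RecipeToHtml, and the small recipe document standing in for Listing 3 are illustrative assumptions, not the article's own files; the recipe uses only the element and attribute names the stylesheet expects.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Listing 7's templates ported to the XSLT 1.0 namespace and run through JAXP (illustrative port). */
class RecipeToHtml {

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
        // Root template: build the page, then hand every child of <Recipe> to the other templates.
      + "<xsl:template match='/Recipe'>"
      + "<HTML><HEAD><TITLE><xsl:value-of select='Name'/></TITLE></HEAD>"
      + "<BODY><H3><xsl:value-of select='Name'/></H3>"
      + "<STRONG><xsl:value-of select='Description'/></STRONG>"
      + "<xsl:apply-templates/></BODY></HTML>"
      + "</xsl:template>"
        // Format ingredients: one table row per <Ingredient>, empty quantities shaded gray.
      + "<xsl:template match='Ingredients'>"
      + "<H4>Ingredients</H4><TABLE BORDER='1'>"
      + "<TR BGCOLOR='#308030'><TH>Qty</TH><TH>Units</TH><TH>Item</TH></TR>"
      + "<xsl:for-each select='Ingredient'><TR>"
      + "<xsl:if test='Qty[not(.=\"\")]'><TD><xsl:value-of select='Qty'/></TD></xsl:if>"
      + "<xsl:if test='Qty[.=\"\"]'><TD BGCOLOR='#404040'></TD></xsl:if>"
      + "<TD><xsl:value-of select='Qty/@unit'/></TD>"
      + "<TD><xsl:value-of select='Item'/>"
      + "<xsl:if test='Item/@optional=\"1\"'>"
      + "<SPAN> -- <em><STRONG>optional</STRONG></em></SPAN></xsl:if>"
      + "</TD></TR></xsl:for-each></TABLE>"
      + "</xsl:template>"
        // Format instructions as an ordered list, one <LI> per <Step>.
      + "<xsl:template match='Instructions'><H4>Instructions</H4>"
      + "<OL><xsl:apply-templates select='Step'/></OL></xsl:template>"
      + "<xsl:template match='Step'><LI><xsl:value-of select='.'/></LI></xsl:template>"
        // Ignore everything not matched above (here: <Name> and <Description>).
      + "<xsl:template match='*' priority='-1'/>"
      + "</xsl:stylesheet>";

    // Hypothetical stand-in for Listing 3; only the element and attribute names matter.
    static final String SAMPLE =
        "<Recipe><Name>Sample Soup</Name>"
      + "<Description>A made-up recipe for trying out the stylesheet.</Description>"
      + "<Ingredients>"
      + "<Ingredient><Qty unit='cups'>2</Qty><Item>water</Item></Ingredient>"
      + "<Ingredient><Qty unit='pinch'></Qty><Item optional='1'>salt</Item></Ingredient>"
      + "</Ingredients>"
      + "<Instructions><Step>Boil the water.</Step><Step>Add the salt.</Step></Instructions>"
      + "</Recipe>";

    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, STYLESHEET));
    }
}
```

The html output method is chosen so that empty elements such as the shaded quantity cell come out as <TD ...></TD> rather than XML-style empty-element tags, which older browsers handle poorly.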
Looking at this code you'll notice, first of all, that the file starts
with the <?xml...?> declaration, indicating that this file is
XML (even though it's also XSL). Each template is bounded by the tags
<xsl:template ...> and </xsl:template>. Every tag that begins with <xsl: is an XSL
command.
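Because an XSL file is ordinary XML, any namespace-aware XML parser can read it without knowing anything about XSL. A minimal sketch using the JDK's DOM parser lists the match pattern of every <xsl:template> in a cut-down copy of Listing 7 (template bodies trimmed for brevity). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Reads an XSL stylesheet with a plain XML parser and lists its templates' match patterns. */
class ListTemplates {

    // Namespace URI declared in Listing 7 (the 1998 XSL working draft).
    static final String WD_XSL_NS = "http://www.w3.org/TR/WD-xsl";

    // Listing 7's skeleton with the template bodies trimmed for brevity.
    static final String SKELETON =
        "<?xml version='1.0'?>"
      + "<xsl:stylesheet xmlns:xsl='http://www.w3.org/TR/WD-xsl'>"
      + "<xsl:template match='/Recipe'><HTML/></xsl:template>"
      + "<xsl:template match='Ingredients'><TABLE/></xsl:template>"
      + "<xsl:template match='Instructions'><OL/></xsl:template>"
      + "<xsl:template match='Step'><LI/></xsl:template>"
      + "<xsl:template match='*' priority='-1'/>"
      + "</xsl:stylesheet>";

    static List<String> templateMatches(String stylesheetXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // resolve the xsl: prefix to its namespace URI
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(stylesheetXml)));
            // Elements are returned in document order.
            NodeList templates = doc.getElementsByTagNameNS(WD_XSL_NS, "template");
            List<String> patterns = new ArrayList<>();
            for (int i = 0; i < templates.getLength(); i++) {
                patterns.add(((Element) templates.item(i)).getAttribute("match"));
            }
            return patterns;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("stylesheet is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(templateMatches(SKELETON));
    }
}
```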
While we won't go over all the templates in the XSL file (since this
isn't an XSL tutorial), Listing 8 provides a quick look at the first
template in the file, just to get the general idea.
<xsl:template match="/Recipe">
  <HTML>
    <HEAD>
      <TITLE>
        <xsl:value-of select="Name"/>
      </TITLE>
    </HEAD>
    <BODY>
      <H3>
        <xsl:value-of select="Name"/>
      </H3>
      <STRONG>
        <xsl:value-of select="Description"/>
      </STRONG>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
Listing 8. The first template from the XSL style
sheet in Listing 7
Notice the <xsl:template> tag: It has an attribute
match="/Recipe". This indicates that this template is to be
applied when a <Recipe> element is encountered as the root element of the
input. The contents of this <xsl:template> element form the output: ordinary
elements such as <HTML> are copied as-is, and each XSL command is replaced by
whatever it produces.
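The leading slash matters: "/Recipe" matches only a <Recipe> that is the document's root element. A small sketch, assuming the XSLT 1.0 processor in the JDK (whose treatment of this pattern agrees with the description above), runs the same template against two hypothetical documents. In the second, <Recipe> is nested inside a made-up <Cookbook> element, so the template never fires; XSLT 1.0's built-in rules then simply copy the text through. The class name RootMatch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Shows that match='/Recipe' fires only when <Recipe> is the root element. */
class RootMatch {

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/Recipe'><H3><xsl:value-of select='Name'/></H3></xsl:template>"
      + "</xsl:stylesheet>";

    static final String ROOT_RECIPE = "<Recipe><Name>Sample Soup</Name></Recipe>";
    static final String NESTED_RECIPE = "<Cookbook><Recipe><Name>Sample Soup</Name></Recipe></Cookbook>";

    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(ROOT_RECIPE, STYLESHEET));   // template fires: an <H3> is built
        System.out.println(transform(NESTED_RECIPE, STYLESHEET)); // template skipped: bare text only
    }
}
```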
The XSL processor sees a <Recipe> element, so it
begins building its output by using the contents of the
<xsl:template> element in the XSL file. It adds an
<HTML> element, then a <HEAD>
element inside of that, and then a <TITLE> element.
It's actually building a new HTML document by creating HTML from
the template, based on what it sees. The <xsl:value-of>
tag instructs the XSL processor to fetch the text contained in some other
element: in this case, the child element <Name>. Moving
a few lines down, you can see the same thing happening, as the XSL
processor fetches the same string again and places it within the
<H3> tag, and then fetches the text of the <Description> element
after it. (Note that we're using the same text in more than one place in a
document, something CSS simply can't do.) Finally, we come to the
<xsl:apply-templates> command, which tells the XSL
processor to take each child of the <Recipe> element in turn and apply
whichever template in the file matches it. That is how the Ingredients
and Instructions templates come into play.
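An <xsl:apply-templates/> with no select attribute hands every child of <Recipe> to the processor, including <Name> and <Description>, which the root template has already used. That is why Listing 7 ends with the catch-all template <xsl:template match="*" priority="-1"/>. A minimal sketch, again assuming the JDK's XSLT 1.0 processor and a hypothetical input, builds the stylesheet with and without that catch-all: without it, XSLT 1.0's built-in rules copy the <Name> text into the body a second time. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Compares apply-templates output with and without Listing 7's low-priority catch-all template. */
class CatchAllDemo {

    static String stylesheet(boolean ignoreUnmatched) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/Recipe'>"
            + "<BODY><H3><xsl:value-of select='Name'/></H3><xsl:apply-templates/></BODY>"
            + "</xsl:template>"
            + "<xsl:template match='Step'><LI><xsl:value-of select='.'/></LI></xsl:template>"
            // Lowest priority and an empty body: any element no other template claims produces nothing.
            + (ignoreUnmatched ? "<xsl:template match='*' priority='-1'/>" : "")
            + "</xsl:stylesheet>";
    }

    // Hypothetical input: <Name> is used by the root template and then visited again by apply-templates.
    static final String INPUT = "<Recipe><Name>Sample Soup</Name><Step>Boil water.</Step></Recipe>";

    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, stylesheet(true)));  // <Name> text appears once
        System.out.println(transform(INPUT, stylesheet(false))); // <Name> text leaks in again
    }
}
```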
The resulting HTML is very similar to the HTML we saw in Listing 1. If
you want to study the XML, XSL, and resulting HTML, and want to learn how
to use XSL to format XML yourself, see the links on XSL in the Resources
section of this article.
Additional XSL capabilities
XSL isn't limited to
just producing HTML. XSL also has complete support for "native"
formatting, which doesn't rely on translation to some other format. Nobody
has yet implemented this part of XSL, though, primarily because page
formatting and layout are very tough problems to wrangle. (There is, however, a
contest to implement all of XSL. See Resources.)
XSL's design also includes embedded scripting. Currently, IBM's
LotusXSL package (written in Java) provides the functionality of almost
all of the current draft specification of XSL, including the ability to
call embedded ECMAScript (the standardized form of JavaScript published by the European standards body ECMA) from XSL
templates.
Of course, as always, with power comes complexity. Learning to write
XSL isn't a piece of cake. But the power's there if you want it.
XML is more than just content management
XSL, like
CSS, can be used on either the client or the server in a client/server
system. This fact provides immense flexibility and organization to Web
site designers and managers. So much so, in fact, that many people think
of XML, CSS, and XSL as another set of technologies for "content
management" for their Web sites. It makes styling Web documents easier and
more consistent, facilitates version control of the site, simplifies
multibrowser management (think of using a style sheet to overcome the many
differences between browsers), and so forth. CSS is also useful for
Dynamic HTML (which we'll discuss a bit below), where much of the
user interaction occurs on the client side, where it belongs. From the
point of view of people managing Web sites, XML, CSS, and XSL are indeed
big wins. And yet, there's a whole world of applications that have nothing
to do with browsers and Web pages. The map of that world is called the
Document Object Model.
Resources
There are so
many XML resources on the Web that I've had to categorize them. The first section
here is the most useful, since the documents are either high-level
summaries or excellent link sites. Apologies to anyone who was omitted.
XML and Java: General XML resources
- "XML, Java and the Future of the Web," Jon Bosak. The paper that
started it all, at least from a Java programmer's point of view.
Definitely worth a read, even if it's a bit dated. Jon is commonly
considered to be the father of XML. Funny how all of these technologies
seem to have paternity:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- "Media-Independent Publishing: Four Myths about XML," Jon Bosak:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/4myths.htm
- Robin Cover's XML-SGML site is, according to my SGML buddies, the
bible of XML resources:
http://www.oasis-open.org/cover/
- The W3C's XML resource page lets you cheer from the sidelines as XML
technology proposals develop into recommendations, or join in the fray
on their active mailing lists:
http://www.w3.org/XML/
- OASIS, the Web site of the Organization for the Advancement of
Structured Information Standards, offers general news and information
about XML:
http://www.oasis-open.org/
- The Graphics Communications Association, host of the XTech '99
conference (March 11 to 13, 1999, San Jose, CA) and the upcoming XML
Europe '99 conference in Granada, Spain, (April 26 to 30, 1999) has a
Web site packed with XML information:
http://www.gca.org/
- XML.com is great for watching trends and digging up XML news:
http://www.xml.com/
- Textuality hosts Tim Bray's site. Check it out for a look at the
"big picture" of how XML fits into the structured document universe --
and for a look at Lark, Tim's nonvalidating XML processor:
http://www.textuality.com/
- The XML FAQ:
http://www.ucc.ie/xml/
- IBM's XML Web site is an outstanding supplement to alphaWorks:
http://www.software.ibm.com/xml/index.html
XML and Java
- "XML and Java: The Perfect Pair" by Ken Sall (Internet.com, November
1998) provides information about XML, Java, and why these two are a
match made in heaven:
http://wdvl.com/Authoring/Languages/XML/Java/index.html
Tutorials and training
- Generally Markup, Richard Lander's Web site may be of interest to
you if you haven't yet read enough about markup languages:
http://pdbeam.uwaterloo.ca/~rlander/
- The Mulberry Technologies Web site is a good resource for commercial
training in XML, as well as general XML and SGML consulting by seasoned
SGML experts:
http://www.mulberrytech.com/
- The Web Developer's Virtual Library Series on XML offers good
summaries of various XML technologies, as well as annotated indices of
XML software:
http://wdvl.com/Software/XML
- Microsoft's Site Builder Network provides a series of articles
called "Extreme XML," one of which appears in the following link. While
some of it focuses on Microsoft-only, Windows-only technology, there's
still some great stuff here:
http://www.microsoft.com/sitebuilder/magazine/xml.asp
- Webmonkey has a good series of articles introducing readers to XML.
The index is at:
http://www.hotwired.com/webmonkey/xml/?tw=xml
- "What the ?xml!" by L.C. Rees offers an interesting take on XML and
why it's necessary -- nicely written and entertaining to boot:
http://www.geocities.com/SiliconValley/Peaks/5957/wxml.html
- "The XML Revolution" by Dan Connolly is a quick backgrounder on XML
(Nature):
http://helix.nature.com/webmatters/xml.html
Cascading Style Sheets
- W3C's CSS page will get you started learning about CSS:
http://www.w3.org/Style/CSS/
- "Cascading Style Sheets: Designing for the Web" by Håkon Wium Lie and
Bert Bos (Addison-Wesley, 1997). Sample chapters from the book appear at:
http://www.awl.com/cseng/titles/0-201-41998-X/liebos/
Extensible Stylesheet Language (XSL)
- The W3C's XSL page:
http://www.w3.org/Style/XSL/
- Read (and comment on) the W3C's XSL Working Draft (currently dated
December 16, 1998):
http://www.w3.org/TR/WD-xsl
- "The Extensible Style Language: Styling XML Documents"
(WebTechniques Magazine) XSL tutorial information and examples:
http://www.webtechniques.com/features/1999/01/walsh/walsh.shtml
- Microsoft's XML and XSL tutorial site is especially interesting
because of the recent release of client-side XSL in Internet Explorer
5.0. Extensive and excellent:
http://www.microsoft.com/xml
- If you're still using IE 4.0, you can still experiment with XML,
using Microsoft's internal DOM:
http://www.microsoft.com/xml/articles/xmlmodel.asp
- If you want to experiment with XSL, try downloading IBM's LotusXSL.
It's all Java, and for the time being, it's free:
http://www.alphaworks.ibm.com/tech/LotusXSL
- Or, you can try James Clark's XT XSL engine, downloadable from:
http://www.jclark.com/xml/xt.html
Upcoming XSL contest
Though the details aren't yet worked out, Sun Microsystems will soon
announce a call for proposals for a $30,000 grant to develop a
client-side processor for full XSL implementation in Mozilla.
It will also announce, in conjunction with Adobe, a contest (first prize
$40,000, second prize $20,000) to develop a pure-Java, server-side
processor of the entire XSL language, to format XML to PDF (Adobe's
document format). Keep watching the Java Developer Connection (requires
free registration) and the Mozilla site for the eventual announcements.
- "XTech '99: Java and the XML wave" by Mark Johnson
(JavaWorld, April 1999) offers the most current information on
the contest:
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xtech.html
Simple API for XML (SAX)
- The definitive description of SAX is available online. You can also
download free SAX software here:
http://www.megginson.com/SAX/index.html
Document Object Model (DOM)
- The W3C information page for the Document Object Model appears on
the W3C site:
http://www.w3c.org/DOM/
- Among other things, you'll find the W3C Recommendation for DOM Level
1:
http://www.w3.org/TR/REC-DOM-Level-1/
- The Java bindings for DOM, for both XML and HTML, are in this
Recommendation appendix:
http://www.w3.org/TR/REC-DOM-Level-1/java-language-binding.html
- A great DOM tutorial by William Robert Stanek appears on PC
Magazine Online in "Object-Based Web Design." This tutorial
includes a discussion of using DOM with IDL, CORBA's Interface
Definition Language:
http://www8.zdnet.com/pcmag/pctech/content/17/13/tf1713.001.html
Dynamic HTML
- The Dynamic HTML Resource page contains several links to DHTML
articles:
http://www.hotwired.com/webmonkey/dynamic_html/?tw=dynamic_html
Software
- Epicentric, Inc.:
http://www.epicentric.com/
- More XML (and other Java) technology than you can shake a stick at
is available at IBM's alphaWorks:
http://alphaworks.ibm.com/
- Version 2 of IBM's excellent XML parser package, xml4j, is available
for download. This package includes several parsers, both validating and
nonvalidating:
http://www.alphaworks.ibm.com/tech/xml4j
- See also IBM's exciting Bean Markup Language project, which uses XML
to represent and manipulate JavaBeans:
http://www.alphaworks.ibm.com/tech/bml
- Another free Java XML parser was written by the indefatigable James
Clark; download it at:
http://www.jclark.com/xml/xp/index.html
- XEENA is IBM alphaWorks's DTD-guided XML editor. You want it, you
need it, you gotta have it:
http://www.alphaworks.ibm.com/tech/xeena
- Mozilla.org is the open source community's effort to extend the
Netscape source code. Find out about it at:
http://www.mozilla.org/
- Information about XML and CSS in Mozilla appears at:
http://www.mozilla.org/rdf/doc/xml.html
- You can read about Sun's XML and Java initiatives at:
http://www.sun.com/990310/java_xml.jhtml
- In addition, Java Project X includes source code downloadable from:
http://developer.java.sun.com/developer/earlyAccess/xml/index.html
- ArborText has a suite of sophisticated tools for editing SGML, XML,
and XSL:
http://www.arbortext.com/Products/products.html
- Oracle8i from Oracle Corporation uses XML inside the Oracle core:
http://www.oracle.com/xml/
- Download Oracle's free XML for Java parser:
http://technet.oracle.com/direct/3xml.htm
- Microsoft's Internet Explorer 5.0, released this month, implements
part of the XSL spec. You can find it on Microsoft's Web site -- and
also just about anywhere else:
http://www.microsoft.com/windows/ie/default.htm
- You can also download a beta release of Microsoft's XML Notepad
editor (limited to running only on Microsoft Windows):
http://www.microsoft.com/xml/notepad/download.asp
- Vervet Logic of Bloomington, IN, has announced XML <PRO>, a
commercial XML editor:
http://www.vervet.com/
- Majix, to transform XML to HTML via XSL, is available at:
http://www.tetrasix.com/
- If your French is rusty, you might want to try the English-language
site at:
http://www.tetrasix.com/english/default.htm
History
- Read about the history of HTML here. It's part of an online book, so
there's no telling for how long it will be available:
http://ei.cs.vt.edu/~wwwbtb/hardcopy/book/chap4/origins.html
The
two chapters listed below, from the book "HTML Unleashed" by Rick Darnell
et al., also cover some of the technical background of these languages.
- SGML history
http://www.webreference.com/dlab/books/html/3-2.html
- XML history (such as it is):
http://www.webreference.com/dlab/books/html/38-0.html
- Nothing to do on Friday night? Why not read up on the history of
SGML? Charles Goldfarb, considered by many to be the "father of SGML,"
reminisces publicly at:
http://www.sgmlsource.com/Goldfarb/history/index.htm
- Useful XML and SGML information appears at Goldfarb's Web site,
including a comprehensive XML book list:
http://www.sgmlsource.com/
Miscellaneous links
- Uche Ogbuji has written an interesting article in
LinuxWorld about using XML on Linux in the Enterprise. It's at:
http://www.linuxworld.com/linuxworld/lw-1999-03/lw-03-xml.html
- Bluestone Software has recently made a splash with pure-Java XML
application servers, and a freely downloadable Swing package called
XwingML:
http://www.bluestone.com/
- Everyone (except Microsoft) is pretty freaked out about the US
Patent Office awarding Microsoft a patent for certain kinds of
functionality in style sheets. What happens with this patent, and its
impact on developing technology, remains to be seen. Judge for yourself
by reading the patent at:
http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
- The title of the sample recipe is actually the title of a very funny
song by William Bolcom. Similar recipes may be found at:
http://www.b4uby.com/granny/gsoup.htm
- The song appears on a compact disc (with other odd songs) available
from the Public Radio Music Source at:
http://75music.org/best/docs/keepers.htm